In this autonomous-car-driving scenario we need the Box2D environments from OpenAI Gym. Since we will be working in a 2D physics environment, we first install the Box2D dependencies with the conda/pip commands shown below ↓
# If you're working in an Anaconda prompt, run these installs directly in your conda console,
# or install them from a Jupyter notebook by adding "!pip install box2d" and "!pip install swig".
# conda install swig
# pip install box2d
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import VecFrameStack,DummyVecEnv
from stable_baselines3.common.evaluation import evaluate_policy
import os
envname = "CarRacing-v0"
env = gym.make(envname)
env.action_space
env.observation_space
episodes = 2
for episode in range(1, episodes + 1):
    state = env.reset()
    done = False
    score = 0
    while not done:
        env.render()
        # sample a random action to see the untrained baseline behaviour
        action = env.action_space.sample()
        n_state, reward, done, info = env.step(action)
        score += reward
    print(f'Episode : {episode}, Score : {score}')
env.close()
As we can observe, the action space is a Box space and the observation space is an RGB image. So, following the algorithm chart, we will again use the PPO algorithm, and since we are dealing with RGB images the policy will be CnnPolicy (a CNN-based model policy).
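The policy choice above boils down to a simple rule of thumb: image-shaped observations call for a CNN, flat state vectors for an MLP. A minimal illustrative sketch (`pick_policy` is a made-up helper for this tutorial, not an SB3 API):

```python
# Hypothetical helper: choose a policy class from the observation shape.
def pick_policy(obs_shape):
    # 3-D observations (height, width, channels) are images -> CNN-based policy;
    # 1-D observations are plain feature vectors -> an MLP policy is enough.
    return "CnnPolicy" if len(obs_shape) == 3 else "MlpPolicy"

print(pick_policy((96, 96, 3)))  # CnnPolicy  (CarRacing's 96x96 RGB frames)
print(pick_policy((8,)))         # MlpPolicy  (e.g. a flat state vector)
```

The string returned here is exactly what gets passed as the first argument to `PPO(...)` below.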
log_path = os.path.join("Training",'Logs')
model = PPO('CnnPolicy',env,verbose = 1,tensorboard_log=log_path)
model.learn(total_timesteps = 40000)
Just as in the previous environments, we save out our logs in order to monitor our training and validation metrics. We can then monitor them on the TensorBoard platform by running tensorboard --logdir=<log_directory>, where log_directory is the directory where you saved your training and validation logs. This serves a page on localhost:6006 where you can browse through the metrics. We then train our model for 40,000 time steps; for a more robust model you can increase the training time to 1M steps or even more!
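One thing the snippet above assumes is that the "Training/Logs" folder already exists; creating it defensively before training avoids a missing-directory error when TensorBoard logging starts. A small sketch, following the tutorial's own folder layout:

```python
import os

# Create the log directory (and "Training" itself) if they don't exist yet;
# exist_ok=True makes this safe to re-run.
log_path = os.path.join("Training", "Logs")
os.makedirs(log_path, exist_ok=True)

# Then, from a terminal:
#   tensorboard --logdir=Training/Logs
# and open http://localhost:6006 in a browser.
```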
ppo_path = os.path.join('Training','Saved Models','PPO_Driving_model')
model.save(ppo_path)
evaluate_policy(model,env,n_eval_episodes = 10,render = True)
env.close()
## Testing the model, where it predicts its own actions
episodes = 3
for episode in range(1, episodes + 1):
    obs = env.reset()
    done = False
    score = 0
    while not done:
        env.render()
        # let the trained model choose the action instead of sampling randomly
        action, _ = model.predict(obs)
        obs, reward, done, info = env.step(action)
        score += reward
    print(f'Episode : {episode}, Score : {score}')
env.close()
As we can see, after running for 10,000 iterations the agent has learnt almost nothing; the image below makes it clear that this agent requires more time to learn the environment. Another possible hack is to try different policies, which also works fine. So this was the result when the agent was trained for 10,000 iterations.
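One way to make "learnt almost nothing" concrete is to smooth the per-episode scores with a running mean and watch whether it trends upward across runs. A minimal stdlib sketch (`running_mean` is an illustrative helper, not part of Stable-Baselines3):

```python
from collections import deque

# Hypothetical helper: mean of the last `window` episode scores at each step,
# useful for judging whether the agent is actually improving over time.
def running_mean(scores, window=10):
    buf = deque(maxlen=window)
    means = []
    for s in scores:
        buf.append(s)
        means.append(sum(buf) / len(buf))
    return means

print(running_mean([0, 10, 20], window=2))  # [0.0, 5.0, 15.0]
```

A flat curve from this helper is the numeric counterpart of the image above: the agent's scores are not improving yet, so longer training (or a different policy) is needed.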